ABSTRACT



Item scores that do not fit an assumed item response theory 
model may cause the latent trait value to be estimated inaccurately. For 
computerized adaptive tests (CAT) with dichotomous items, several person-fit 
statistics for detecting nonfitting item score patterns have been proposed. 
For both paper-and-pencil (P&P) tests and CATs, detection of person misfit 
with polytomous items has hardly been explored. In this simulation study, the 
theoretical and empirical null distributions of a person-fit statistic for 
polytomous items are compared for P&P tests and CATs. Results show that the 
empirical distribution of this statistic was close to the standard normal 
distribution, for both P&P tests and CATs. Also, statistics that are 
especially designed for a CAT are proposed. In these statistics, observed and 
expected item scores are compared using cumulative sum (CUSUM) procedures. 
Results show that the critical values of the CUSUM were symmetric around zero 
and similar across latent trait values. Moreover, the results show that for 
the CUSUM procedure fixed critical values for all examinees can be used. 

Key words: appropriateness measurement, computer adaptive testing, 
cumulative sum, item response theory, person fit, polytomous item response 
models. 
Detection of Person Misfit in Computerized Adaptive Tests with Polytomous Items 

The aim of a computerized adaptive test (CAT) is to construct an optimal test for 
each examinee. This is realized by estimating the examinee’s ability level ($\theta$) after 
administration of each item and selecting the next item based on the current ability 
estimate ($\hat{\theta}$). The $\theta$-estimation procedure, the item selection procedure, and the stopping 
rule of a CAT are all based on the assumption that the item scores of an examinee fit 
the assumed item response theory (IRT) model. It is questionable, however, whether the 
assumed IRT model gives a good description of each examinee’s test behavior. For those 
examinees for whom this is not the case, the ability estimate may be an inadequate measure 
of true $\theta$, and as a result the construction of an optimal test may be difficult. Many 
causes may invalidate $\hat{\theta}$: for example, knowledge of the correct answers 
due to test preview on achievement tests, faking on biodata questionnaires or personality 
tests, random guessing on all items in the test in order to become familiar with the 
questions, or lack of motivation in item-pretesting situations. To detect examinees with 
invalid $\hat{\theta}$, person-fit statistics have been proposed. 

Most person-fit research has been conducted for paper-and-pencil (P&P) tests with 
dichotomously scored items (e.g., Drasgow, Levine, & Williams, 1985; Meijer, 1994; 
Tatsuoka, 1984). Recently, some studies investigated the assessment of person fit in 
CATs with dichotomously scored items (e.g., Nering, 1997; van Krimpen-Stoop & Meijer, 
1999b, 1999c). However, there are no studies known to the authors that deal with person- 
fit research in a CAT with polytomously scored items. This may be explained by the fact 
that only a few studies investigated the use and implementation of CATs with polytomous 
items (see Dodd, De Ayala, & Koch, 1995, for an overview). In the present study we 
will investigate the use of person-fit statistics for polytomously scored CATs. Also, some 
results for person-fit statistics in P&P tests with polytomous items are discussed. 

This study is organized as follows. First, a short overview of item response theory 
for polytomous items with ordered score categories is given. Second, a short overview of 
research in the context of CAT with polytomous items is given. Third, existing person- 
fit statistics that are designed for polytomous items in P&P tests are described and new 
statistics that can be used in a CAT and that are based on theory from Statistical Process 
Control are proposed. Fourth, simulation studies are conducted in which the theoretical 
and empirical distributions of an existing person-fit statistic are compared both for P&P 
tests and for CAT. Finally, a simulation study investigating the critical values of the newly 
proposed statistics is conducted. 

Polytomous Item Response Models 

Models for unidimensional polytomous items with ordered score categories are 
considered here; that is, models in which the item responses are scored into more than two 
ordered categories. Examples of such items are Likert-type attitude items or achievement 
items with partially correct scoring. Let $x_i$ be the realization of $X_i$, the score on item $i$, and 
let $\mathbf{x} = (x_1, \ldots, x_N)$ denote the observed score pattern on an $N$-item test. Furthermore, let 
the responses to item $i$ be categorized into $m + 1$ ordered score categories $j = 0, 1, \ldots, m$, 
where higher scores reflect a higher $\theta$ level. 

According to Mellenbergh (1995; see also Molenaar, 1983), three families of models 
for ordered polytomous items can be distinguished, where the distinction between these 
models is based on three different methods to split an ordinal polytomous response 
variable into a set of dichotomies (see Agresti, 1990). In all three methods a polytomous 
response variable with $m + 1$ categories is split into $m$ dichotomies. The models in the 
first family are called the adjacent-category models, where the ordinal response 
variable with $m + 1$ categories is split into $m$ adjacent-category pairs. The probability of 
obtaining a score $j$ is determined conditional on obtaining a score $j - 1$ or $j$: 
$P(X_i = j \mid X_i = j - 1 \vee X_i = j)$. 
Examples of such models are the partial credit model (PCM; Masters, 1982), the rating 
scale PCM (Andrich, 1978), and the generalized PCM (Muraki, 1992). 

The second family consists of the cumulative-probability models, where the ordinal 
response variable with $m + 1$ categories is split into $m$ cumulative probabilities. Here, the 
probability is determined of obtaining a score in category $j$ or higher: $P(X_i \ge j)$. Examples of 
such models are the graded response model (Samejima, 1969) and the rating scale graded 
response model (Muraki, 1990). 

The third family consists of the continuation-ratio models, where the ordinal variable 
with $m + 1$ categories is split into $m$ continuation ratios, and the probability of obtaining 
a score $j$ or higher, conditional on obtaining a score $j - 1$ or higher, is of interest: 
$P(X_i \ge j \mid X_i \ge j - 1)$. An example is the sequential model (Tutz, 1990). For recent 
developments in polytomous item score models, see, for example, Hemker (1996) and 
Akkermans (1998). 

Let $P_{ij}(\theta)$ denote the probability that an examinee with ability $\theta$ obtains a score $j$ on 
item $i$. Although the present study is restricted to the PCM (Masters, 1982), the theory 
and applications discussed can easily be generalized to other ordinal polytomous models. 
In the PCM the item parameters $\delta_{ik}$, for $k = 1, \ldots, m$, are often described as item-step 
difficulties, where $\delta_{ik}$ is the point on the $\theta$-axis where the probabilities of obtaining scores 
$k$ and $k - 1$ intersect (i.e., $P_{ik}(\theta) = P_{i,k-1}(\theta)$). Let $\boldsymbol{\delta}_i = (\delta_{i1}, \ldots, \delta_{im})$ denote the vector 
with the individual step difficulties for item $i$ and let $\boldsymbol{\delta} = (\boldsymbol{\delta}_1, \boldsymbol{\delta}_2, \ldots, \boldsymbol{\delta}_N)$ denote the vector 
of vectors $\boldsymbol{\delta}_i$. The probability of scoring $X_i = x_i$ on item $i$ conditional on $\theta$, according to 
the PCM (Masters, 1982), is defined as 

$$P_{i x_i}(\theta) = \frac{\exp\left[\sum_{k=0}^{x_i} (\theta - \delta_{ik})\right]}{\sum_{h=0}^{m} \exp\left[\sum_{k=0}^{h} (\theta - \delta_{ik})\right]}, \tag{1}$$

such that $\sum_{j=0}^{m} P_{ij}(\theta) = 1$ and $\delta_{i0} = 0$. 

Adaptive Testing and Polytomously Scored Items 

Most CAT research has been conducted for dichotomous items. The few studies that have 
used polytomous items used the PCM, the graded response model (Samejima, 1969), the 
nominal response model (Bock, 1972), the rating scale model (Andrich, 1978), or the 
successive intervals model (Rost, 1988). 

Polytomous CAT research investigated characteristics of the item pool, the item 
selection criterion, the $\theta$-estimation procedure, and the stopping rule. Interesting results 
were, for example, that compared to CAT with dichotomous item scores, the size of the 
item pool may be substantially smaller to get an accurate estimate of $\theta$ (see e.g., Dodd, 
Koch, & De Ayala, 1993, and Koch & Dodd, 1989). Item pools should not be too small, 
however, in order to secure, for example, content validity. 

As in dichotomous CAT, item selection in polytomous CAT is often based on 
maximum item information and in most cases maximum likelihood estimation is used for 
estimating $\theta$. However, the maximum likelihood estimate can only be determined when 
the response to the first item is not in the lowest or the highest category of the item. An 
alternative may be to use Warm’s (1989) estimation procedure (e.g., van Krimpen-Stoop 
& Meijer, 1999c). 
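
The maximum likelihood step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of Masters (1982) or of the studies cited: it assumes the PCM, applies Newton-Raphson to the log-likelihood, and the names `pcm_probs` and `ml_theta`, the step cap, and the example step difficulties are hypothetical. Under the PCM the first derivative of the log-likelihood is the sum of the differences between observed and expected item scores, and the second derivative is minus the sum of the item score variances, so a pattern with only lowest or only highest scores has no finite estimate.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities P_i0, ..., P_im of one PCM item."""
    # Cumulative sums of (theta - delta_ik); the term for score 0 is fixed at 0.
    cum = [0.0]
    for delta in deltas:
        cum.append(cum[-1] + theta - delta)
    top = max(cum)  # subtract the maximum before exponentiating, for stability
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def ml_theta(scores, item_deltas, start=0.0, tol=1e-8, max_iter=100):
    """Newton-Raphson ML estimate of theta; None if no finite estimate exists."""
    max_total = sum(len(d) for d in item_deltas)
    if sum(scores) in (0, max_total):
        return None  # all-lowest or all-highest pattern: no finite maximum
    theta = start
    for _ in range(max_iter):
        resid = info = 0.0
        for x, deltas in zip(scores, item_deltas):
            p = pcm_probs(theta, deltas)
            mean = sum(j * pj for j, pj in enumerate(p))
            resid += x - mean  # first derivative of the log-likelihood
            info += sum((j - mean) ** 2 * pj for j, pj in enumerate(p))  # minus second derivative
        step = max(-1.0, min(1.0, resid / info))  # capped step (hypothetical safeguard)
        theta += step
        if abs(step) < tol:
            break
    return theta

items = [[-1.0, 0.0, 1.0]] * 4        # four hypothetical three-step items
print(ml_theta([1, 2, 1, 2], items))  # close to 0: total score equals the expected score at theta = 0
print(ml_theta([0, 0, 0, 0], items))  # None
```

The explicit check on the total score mirrors the restriction described above: as long as all responses are in the lowest (or all in the highest) category, some other rule is needed to obtain a provisional estimate.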

When the maximum likelihood estimate is determined after the first response, the 
estimate will be very unstable with a high standard error. To overcome this problem, in 
most cases a systematic procedure to estimate $\theta$ is used until item scores in two different 
item categories are observed, and after this, maximum likelihood is used to estimate $\theta$ 
(see e.g., Koch & Dodd, 1989, and Dodd, Koch, & De Ayala, 1989). One systematic 
procedure is the fixed stepsize procedure, in which the new preliminary estimate of $\theta$ 
is increased/decreased by a constant when the response was in the upper/lower half of 
the response scale. Also, the variable stepsize procedure can be used, where the new 
preliminary estimate of $\theta$ is increased/decreased by half times the highest/lowest step 
difficulty when the response was in the upper/lower half of the response scale. Research 
showed that the use of variable stepsize leads to better results compared with fixed 
stepsize, in terms of fewer cases of nonconvergence of the $\theta$ estimate (see e.g., Koch 
& Dodd, 1989). 
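
The fixed stepsize procedure can be sketched as below. This is only an illustration of the rule as described in the text; the function names are hypothetical, and the handling of an exact middle category (possible when the number of categories is odd) is an assumption, because the text does not specify it.

```python
def fixed_stepsize_update(theta, score, m, step=0.5):
    """Provisional theta after one response on a 0..m scale (fixed stepsize)."""
    midpoint = m / 2
    if score > midpoint:   # response in the upper half of the response scale
        return theta + step
    if score < midpoint:   # response in the lower half of the response scale
        return theta - step
    return theta           # exact middle category: left unchanged (assumption)

def use_stepsize(scores):
    """True while fewer than two different score categories have been observed."""
    return len(set(scores)) < 2

theta, scores = 0.0, []
for score in [3, 3, 1]:    # hypothetical responses to three-step items (m = 3)
    scores.append(score)
    if use_stepsize(scores):
        theta = fixed_stepsize_update(theta, score, m=3)
print(theta, use_stepsize(scores))  # 1.0 False
```

After the third response two different categories have been observed, so from that point on maximum likelihood estimation would take over.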

For the stopping rule, a number of alternatives can be used. The test can be stopped 
when a certain number of items has been administered (fixed test length), when the 
accuracy in the estimation of $\theta$ is within a prespecified standard error of $\hat{\theta}$ (standard error 
rule), or when there are no items available in the item pool that have a minimum level 
of information conditional on the current estimate of $\theta$ (minimum information rule). For 
a comparison of the minimum information stopping rule and the standard error stopping 
rule see, for example, Dodd et al. (1989). 

Person-Fit Analysis 

In person-fit analysis the fit of an individual item score pattern is investigated to detect 
misfitting item score patterns. In the few studies in which person fit was applied in the 
context of polytomous items, the polytomous items were dichotomized and person-fit 
statistics were used for dichotomous item scores (see e.g., Zickar & Drasgow, 1996). 
A disadvantage is that part of the information contained in the polytomous item is lost, 
because each pair of adjacent categories of the polytomous item can be seen as a single 
dichotomous item (see e.g., Molenaar, 1983). By dichotomizing the scores the length 
of the test is actually decreased, which is unfavorable for the assessment of person fit 
(Reise & Due, 1991). Also, compared with the dichotomized version of the item, the 
item information function of a polytomously scored item is higher at the peak of the 
function and the information is also distributed across a wider range of $\theta$, which may also 
enhance the assessment of person fit (Reise & Due, 1991). Finally, the dichotomization 
that is chosen for a polytomous score variable has a substantial effect on the measurement 
outcome $\hat{\theta}$, unless specific conditions on the item parameters hold (Jansen & Roskam, 
1986; Roskam & Jansen, 1989). 

For tests with dichotomous or polytomous items, misfitting item score patterns consist 
of many incorrect scores to easy or unpopular items and many correct scores to difficult 
or popular items (e.g., Meijer & Sijtsma, 1999). In the dichotomous case, $X_i$ is 0 or 1, 
the expected score $E(X_i \mid \theta)$ of item $i$ equals the probability of a correct response, and a 
weighted function of the residual, 

$$f\bigl(X_i - E(X_i \mid \theta)\bigr), \tag{2}$$

is used to determine person fit. Also for polytomous items, the expected score on item $i$ 
can be determined and observed and expected scores can be compared. Because in this 
paper $P_{ij}(\theta)$ is defined as the probability of obtaining score $j$ on item $i$, the expected 
score $E(X_i \mid \theta)$, according to the general definition of the expectation (see e.g., Lindgren, 
Chapter 4), can be written as 

$$E(X_i \mid \theta) = \sum_{j=0}^{m} j\, P_{ij}(\theta),$$

and $X_i \in \{0, 1, \ldots, m\}$. 
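
As a small illustration of the expected score and the residual, the following Python sketch assumes the PCM; the function names and the step difficulties are hypothetical. It also shows that for an item with a single step (the dichotomous case) the expected score reduces to the probability of a score of 1.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities P_i0, ..., P_im of one PCM item."""
    cum = [0.0]  # the term for score 0 is fixed at 0
    for delta in deltas:
        cum.append(cum[-1] + theta - delta)
    top = max(cum)
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def expected_score(theta, deltas):
    """E(X_i | theta): sum over j of j * P_ij(theta)."""
    return sum(j * p for j, p in enumerate(pcm_probs(theta, deltas)))

def residual(score, theta, deltas):
    """Observed minus expected item score."""
    return score - expected_score(theta, deltas)

one_step = [0.5]                 # hypothetical dichotomous item
three_step = [-1.0, 0.0, 1.0]    # hypothetical three-step item
print(expected_score(0.0, one_step), pcm_probs(0.0, one_step)[1])  # the same value twice
print(residual(3, 0.0, three_step))  # positive: the score is higher than expected
```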

Existing Person-Fit Statistics 

An often used person-fit statistic for dichotomously scored items is the log-likelihood 
statistic $l$ (Levine & Rubin, 1979; Drasgow, Levine, & Williams, 1985). Drasgow, Levine, 
& Williams (1985) also proposed a standardized log-likelihood statistic for polytomous 
items. Assuming local independence between all items, the likelihood of score pattern $\mathbf{x}$ 
can be written as 

$$L(\mathbf{x} \mid \theta, \boldsymbol{\delta}) = \prod_{i=1}^{N} P_{i x_i}(\theta),$$

and the log-likelihood $l$ is defined as the natural logarithm of $L$ and can be written as 

$$l(\mathbf{x} \mid \theta, \boldsymbol{\delta}) = \ln\left[L(\mathbf{x} \mid \theta, \boldsymbol{\delta})\right] = \sum_{i=1}^{N} \ln P_{i x_i}(\theta).$$

Because $l$ is dependent on $\theta$, Drasgow et al. (1985) proposed to use the standardized 
version of $l$, denoted as $l_z$: 

$$l_z(\mathbf{x} \mid \theta, \boldsymbol{\delta}) = \frac{l(\mathbf{x} \mid \theta, \boldsymbol{\delta}) - E(l \mid \theta)}{\left[\mathrm{var}(l \mid \theta)\right]^{1/2}},$$

where $E(l \mid \theta)$ denotes the expected value of $l$, 

$$E(l \mid \theta) = \sum_{i=1}^{N} \sum_{j=0}^{m} P_{ij}(\theta) \ln P_{ij}(\theta),$$

and $\mathrm{var}(l \mid \theta)$ the variance of $l$, 

$$\mathrm{var}(l \mid \theta) = \sum_{i=1}^{N} \left[ \sum_{j=0}^{m} \sum_{h=0}^{m} P_{ij}(\theta) P_{ih}(\theta) \ln P_{ij}(\theta) \ln \frac{P_{ij}(\theta)}{P_{ih}(\theta)} \right].$$

In practice, $\theta$ is unknown and $\hat{\theta}$ should be used to determine $l_z$. Large negative values 
of $l_z$ indicate a low probability of obtaining score pattern $\mathbf{x}$; thus, large negative values 
of $l_z$ indicate misfitting item score patterns. Drasgow et al. (1985) found that for P&P 
tests the empirical distribution of $l_z$ using $\hat{\theta}$ was reasonably close to the standard normal 
distribution for long tests (tests with more than 80 items). 
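
The standardized log-likelihood statistic can be sketched in Python as below. The sketch assumes the PCM and a given value of $\theta$ (in practice an estimate would be plugged in); the function names and item parameters are hypothetical. The per-item variance is computed as the mean of the squared log probabilities minus the squared mean, which is algebraically equal to the double sum over categories.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities P_i0, ..., P_im of one PCM item."""
    cum = [0.0]  # the term for score 0 is fixed at 0
    for delta in deltas:
        cum.append(cum[-1] + theta - delta)
    top = max(cum)
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def lz(scores, theta, item_deltas):
    """Standardized log-likelihood of one item score pattern at a given theta."""
    loglik = expected = variance = 0.0
    for x, deltas in zip(scores, item_deltas):
        p = pcm_probs(theta, deltas)
        logs = [math.log(pj) for pj in p]
        loglik += logs[x]                                 # ln P_{i x_i}(theta)
        mean_i = sum(pj * lj for pj, lj in zip(p, logs))  # expected log probability, item i
        expected += mean_i
        # variance of the log probability for item i (equals the double-sum form)
        variance += sum(pj * lj * lj for pj, lj in zip(p, logs)) - mean_i ** 2
    return (loglik - expected) / math.sqrt(variance)

items = [[-1.0, 0.0, 1.0]] * 10   # ten hypothetical three-step items
likely = [1, 2] * 5               # scores in the most probable categories at theta = 0
unlikely = [0] * 10               # lowest category on every item
print(lz(likely, 0.0, items) > 0 > lz(unlikely, 0.0, items))  # True
```

The unlikely pattern receives a large negative value, which is the direction in which misfit is flagged.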

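Another person-fit statistic that can be used for polytomous items was proposed by 
Wright & Masters (1982). Wright & Masters (1982) proposed to use the standardized 
weighted mean squared residual 

$$v = \frac{\sum_{i=1}^{N} \left[X_i - E(X_i \mid \theta)\right]^2}{\sum_{i=1}^{N} \mathrm{var}(X_i \mid \theta)},$$

where a transformation of $v$ was used to correct for kurtosis, 

$$t = \left(v^{1/3} - 1\right)\frac{3}{q} + \frac{q}{3},$$

with 

$$E(X_i \mid \theta) = \sum_{j=0}^{m} j\, P_{ij}(\theta), \tag{3}$$

$$\mathrm{var}(X_i \mid \theta) = \sum_{j=0}^{m} \left[j - E(X_i \mid \theta)\right]^2 P_{ij}(\theta), \tag{4}$$

and 

$$q^2 = \frac{\sum_{i=1}^{N} \left\{ \sum_{j=0}^{m} \left[j - E(X_i \mid \theta)\right]^4 P_{ij}(\theta) - \left[\mathrm{var}(X_i \mid \theta)\right]^2 \right\}}{\left[\sum_{i=1}^{N} \mathrm{var}(X_i \mid \theta)\right]^2}.$$
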

Wright & Masters (1982, pp. 108-109) claim that t is standard normally distributed when 
the PCM holds. Some research has been conducted with this statistic using dichotomous 
data, where the PCM becomes the Rasch (1960) model and the statistic t is equivalent 
to the statistic proposed by Wright & Stone (1979, Chapter 4). For example, Rogers & 
Hattie (1987) showed that the empirical distribution was far off the expected theoretical 
distribution and, as a result, using critical values based on the theoretical distribution, t 
was insensitive to misfitting item score patterns. Also, Hoijtink (1986) showed that the 
distribution of the dichotomous version of t was far from standard normal in the case of 
the Rasch model. 
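
A minimal Python sketch of the weighted mean squared residual and its standardization follows. It assumes the PCM, a known $\theta$, and the cube-root transformation $t = (v^{1/3} - 1)(3/q) + q/3$, with $q^2$ built from the fourth central moments of the item scores, as this transformation is commonly given for the Wright & Masters weighted fit statistic; treat that form, the function names, and the item parameters as assumptions of the sketch.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities P_i0, ..., P_im of one PCM item."""
    cum = [0.0]  # the term for score 0 is fixed at 0
    for delta in deltas:
        cum.append(cum[-1] + theta - delta)
    top = max(cum)
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def weighted_fit(scores, theta, item_deltas):
    """Return (v, t): weighted mean squared residual and its standardized version."""
    sq_resid = sum_var = sum_kurt = 0.0
    for x, deltas in zip(scores, item_deltas):
        p = pcm_probs(theta, deltas)
        mean = sum(j * pj for j, pj in enumerate(p))
        var = sum((j - mean) ** 2 * pj for j, pj in enumerate(p))
        fourth = sum((j - mean) ** 4 * pj for j, pj in enumerate(p))
        sq_resid += (x - mean) ** 2
        sum_var += var
        sum_kurt += fourth - var ** 2
    v = sq_resid / sum_var
    q = math.sqrt(sum_kurt) / sum_var
    t = (v ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0  # assumed cube-root transformation
    return v, t

items = [[-1.0, 0.0, 1.0]] * 10            # ten hypothetical three-step items
print(weighted_fit([0] * 10, 0.0, items))  # v well above 1, t positive
print(weighted_fit([1, 2] * 5, 0.0, items))  # v below 1, t negative
```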

Cumulative Sum Procedures 

In a cumulative sum (CUSUM) procedure, originally proposed by Page (1954), sums of 
statistics are accumulated, but only if they exceed 'the goal value' by more than $d$ units. 
Let $Z_t$ be the value of a standard normally distributed statistic $Z$ obtained from a sample 
of size $n$ at time point $t$. Furthermore, let $d$ be the reference value. Then, a two-sided 
CUSUM procedure can be written in terms of $C_t^+$ and $C_t^-$, where 

$$C_t^+ = \max\left[0, (Z_t - d) + C_{t-1}^+\right], \text{ and}$$

$$C_t^- = \min\left[0, (Z_t + d) + C_{t-1}^-\right],$$

with starting values $C_0^+ = C_0^- = 0$. Note that the sums are accumulating on both sides 
concurrently. Thus, as soon as $|Z_t| > d$, $Z_t$ values are accumulated in $C_t^+$ and $C_t^-$. Let $h$ 
denote some threshold. The process is 'out-of-control' when $C_t^+ > h$ or $C_t^- < -h$ and 
'in-control' otherwise. 

One assumption underlying the CUSUM procedure is that the $Z_t$-values are 
asymptotically standard normally distributed; the values of $d$ and $h$ are based on this 
assumption. The value of $d$ is usually selected as one-half of the mean shift (in $Z_t$-units) 
one wishes to detect; for example, $d = 0.5$ is the appropriate choice for detecting a shift 
of one times the standard deviation of $Z_t$. In practice, CUSUM-charts with $d = 0.5$ and 
$h = 4$ or $h = 5$ are often used (for the underlying rationale of this choice, 
see Montgomery, 1997, p. 322). Setting these values for $d$ and $h$ results in a significance 
level of approximately $\alpha = 0.0027$ (two-sided). Note that in person-fit research $\alpha$ is fixed 
and critical values are derived from the null distribution of the statistic. In this study, we 
will also use a fixed $\alpha$ and will derive critical values from simulations. 
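
The two-sided CUSUM recursion with reference value $d$ and threshold $h$ can be sketched as follows. The function name and the example series are hypothetical; the defaults $d = 0.5$ and $h = 5$ are the conventional chart settings mentioned above, not the critical values derived in this study.

```python
def cusum_signal(z_values, d=0.5, h=5.0):
    """Return the first time point at which the two-sided CUSUM signals, else None."""
    c_plus = c_minus = 0.0                        # starting values C_0^+ = C_0^- = 0
    for t, z in enumerate(z_values, start=1):
        c_plus = max(0.0, (z - d) + c_plus)       # accumulates upward deviations
        c_minus = min(0.0, (z + d) + c_minus)     # accumulates downward deviations
        if c_plus > h or c_minus < -h:
            return t                              # process is 'out-of-control'
    return None                                   # process stayed 'in-control'

print(cusum_signal([0.2, -0.3, 0.1, 0.4]))  # None
print(cusum_signal([1.5] * 10))             # 6
print(cusum_signal([-1.5] * 10))            # 6
```

In the second series each value adds 1.0 to the upper sum, so the threshold of 5 is first exceeded at the sixth time point.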

Both van Krimpen-Stoop & Meijer (1999b) and Bradlow, Weiss, & Cho (1998) 
proposed to use statistical process control techniques to detect person misfit in a CAT. Van 
Krimpen-Stoop & Meijer (1999b) proposed statistics to be used in a CUSUM procedure 
to investigate person fit in an on-line application or after complete administration of a 
CAT with dichotomously scored items. These statistics were based on the responses to 
single items resulting in a sample size of 1 at each t. Because the theoretical distribution 
of these statistics is a Bernoulli distribution, and not a standard normal distribution, it 
was necessary to determine critical values to classify a score pattern as nonfitting by 
means of a simulation study. The critical values were found to be stable across $\theta$ values. 
Van Krimpen-Stoop & Meijer (1999a) also proposed CUSUM-based statistics using the 
responses to disjoint subsets of items which resulted in a sample size of n > 1 at each t. 
These statistics followed a distribution that was close to the standard normal distribution 
when n was not too small (10 or more items in each disjoint subset). Thus for these 
statistics a theoretical distribution can be used to determine the critical values. Van 
Krimpen-Stoop & Meijer (1999a) found that the use of theoretically determined critical 
values resulted in empirical Type I errors that were close to the nominal ones. A limitation 
was, however, that the subsets of items should not be too small or too large. For detailed 
information, see van Krimpen-Stoop & Meijer (1999a). 
CAT and CUSUM Procedures 

Sums of consecutive negative or positive residuals can be investigated using a CUSUM 
procedure. Let $i_k$ denote the $k$th item in the CAT; that is, $k$ is the stage of the CAT. Further, 
let the statistic $T_k$ be a function of the residuals at stage $k$, $N$ the final test length, and let, 
without loss of generality, the reference value $d$ be equal to 0. For each examinee, at each 
stage $k$ of a CAT, the CUSUM procedure can be determined as 

$$C_k^+ = \max\left[0, T_k + C_{k-1}^+\right], \tag{5}$$

$$C_k^- = \min\left[0, T_k + C_{k-1}^-\right], \text{ and} \tag{6}$$

$$C_0^+ = C_0^- = 0, \tag{7}$$

where $C^+$ and $C^-$ are sensitive to series of positive and negative values of $T_k$, respectively. 
Let UB and LB be some appropriate upper and lower bound, respectively. Then, when 
$C^+ > \mathrm{UB}$ or $C^- < \mathrm{LB}$ the item score pattern can be classified as not fitting the model; 
otherwise, the item score pattern can be classified as fitting the model. 

In the polytomous case, $T_k$ can be written as a function of the residuals as in Equation 
2. In Equation 2, the value of the statistic is determined given the true value of $\theta$. In 
practice, however, this true value is unknown and as an alternative an estimate of $\theta$ 
can be used. In a CAT, two alternative estimates of $\theta$ can be chosen. First, during 
administration of the test at each stage $k$, $\theta$ is estimated based on the responses to the 
previously administered items (denoted as $\hat{\theta}_{k-1}$) and this updated estimate can be used to 
compute the value of $T$. Second, the final estimate of $\theta$ (denoted as $\hat{\theta}_N$) can be used 
to compute $T$. An advantage of using the updated estimate $\hat{\theta}_{k-1}$ is that the fit can be 
investigated during test administration, although $\hat{\theta}_{k-1}$ may be less accurate than $\hat{\theta}_N$. 
When the final estimate $\hat{\theta}_N$ is used, the fit can no longer be investigated during the 
test, because $\hat{\theta}_N$ needs to be computed first and this is done at the end of the test. 

Statistics 

Two simple statistics are the unweighted residual between the observed and expected 
score, corrected for test length, 

$$T_k^1 = \frac{1}{N}\left[X_{i_k} - E(X_{i_k} \mid \theta)\right],$$

and the weighted residual, corrected for test length and the variance of the item score, 

$$T_k^2 = \frac{1}{N}\left[\frac{X_{i_k} - E(X_{i_k} \mid \theta)}{\left[\mathrm{var}(X_{i_k} \mid \theta)\right]^{1/2}}\right],$$

where $E(X_{i_k} \mid \theta)$ and $\mathrm{var}(X_{i_k} \mid \theta)$ are defined in Equations 3 and 4, respectively. Note that 
all kinds of other functions of the residual can be taken. This study, however, is restricted 
to the statistics $T^1$ and $T^2$. 

To determine upper and lower bounds in a CUSUM procedure it is assumed that the 
statistic computed at each stage is asymptotically standard normally distributed. However, 
the null distributions of $X_{i_k}$, and thus of $T^1$ and $T^2$, are far from standard normal: in the 
dichotomous case, $X_{i_k}$ follows a Bernoulli distribution with parameter $P_{i_k}(\theta)$, and in the 
polytomous case, $X_{i_k}$ follows a multinomial distribution with $m$ observations and parameter vector 
$(P_{i_k 1}(\theta), \ldots, P_{i_k m}(\theta))$, where $m$ is the highest ordered response category. As a result, 
setting $d = 0.5$ and the upper and lower bound to $h = 5$ and $-h = -5$, respectively, is not 
appropriate in this context. Therefore, in this study, the numerical values of the upper and 
lower bound are investigated through simulation, with for example $\alpha = 0.05$ and $d = 0$ 
(see also van Krimpen-Stoop & Meijer, 1999a, for similar research with dichotomous 
items). 

This study is limited to the use of statistics based on the responses to single items, 
thus a sample size of 1 at each time point. Constructing a substantial number of disjoint 
subsets of items of 10 or more in a polytomous CAT or P&P test is difficult, because the 
test length of a polytomous CAT is in general smaller than the length of a dichotomous 
CAT, due to higher information of the polytomously scored items (see e.g., De Ayala, 
1992). A disadvantage of the use of statistics based on responses to single items is the 
lack of theoretically determined critical values, and it is therefore necessary to determine 
critical values by means of a simulation study. 
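
The CUSUM procedure for a single examinee can be sketched in Python as below. The sketch assumes the PCM, a given value of $\theta$, reference value $d = 0$, and that "corrected for test length" means dividing the residual by the final test length $N$; that reading, the function names, the item parameters, and the bounds in the example are assumptions for illustration, not values from this study.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities P_i0, ..., P_im of one PCM item."""
    cum = [0.0]  # the term for score 0 is fixed at 0
    for delta in deltas:
        cum.append(cum[-1] + theta - delta)
    top = max(cum)
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def cusum_extremes(scores, item_deltas, theta, weighted=False):
    """Return (max C+, min C-) over all stages of one administered test (d = 0)."""
    n = len(scores)
    c_plus = c_minus = max_plus = min_minus = 0.0
    for x, deltas in zip(scores, item_deltas):
        p = pcm_probs(theta, deltas)
        mean = sum(j * pj for j, pj in enumerate(p))
        t_k = x - mean                                  # residual at stage k
        if weighted:                                    # weighted residual: divide by the SD
            var = sum((j - mean) ** 2 * pj for j, pj in enumerate(p))
            t_k /= math.sqrt(var)
        t_k /= n                                        # assumed correction for test length
        c_plus = max(0.0, t_k + c_plus)
        c_minus = min(0.0, t_k + c_minus)
        max_plus = max(max_plus, c_plus)
        min_minus = min(min_minus, c_minus)
    return max_plus, min_minus

def is_misfitting(scores, item_deltas, theta, ub, lb, weighted=False):
    """Classify a pattern as misfitting when a CUSUM crosses UB or LB."""
    max_plus, min_minus = cusum_extremes(scores, item_deltas, theta, weighted)
    return max_plus > ub or min_minus < lb

items = [[-1.0, 0.0, 1.0]] * 10   # ten hypothetical three-step items
print(is_misfitting([3] * 10, items, 0.0, ub=0.5, lb=-0.5))    # True: long run of positive residuals
print(is_misfitting([1, 2] * 5, items, 0.0, ub=0.5, lb=-0.5))  # False: residuals alternate in sign
```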

Simulation Studies 




Purpose 

This simulation study was designed to investigate whether the empirical null distribution 
of $l_z$ using $\hat{\theta}$ was in agreement with the standard normal distribution for short polytomous 
P&P tests and CATs. Drasgow, Levine, & Williams (1985) showed that for long P&P 
tests (80 items or more) the empirical distribution of $l_z$ was close to the standard normal 
distribution. However, it is unknown how well this theoretical distribution holds for 
shorter P&P tests and CATs. 

Second, the numerical values of the upper and lower thresholds of the CUSUM 
procedures for statistics $T^1$ and $T^2$ across $\theta$ levels were examined. In the case that these 
critical values are similar across $\theta$ values, in practice, one fixed UB and LB can be used 
for all examinees. This eases the use of these statistics. 

Method 



Item Pool 

To be consistent with earlier research on polytomous CAT, an item pool consisting of 60 
three-step items from Koch & Dodd (1989) that fit the PCM was used. In Table 1 the 
values of the item parameters are given. 

Two P&P tests were constructed, one 20-item test and one 30-item test. The 20-item 
test was constructed using the first 20 items of the item pool, whereas for the 30-item test 
the first 30 items were used. For each test, six datasets of 1,000 response vectors were 
simulated. Five datasets were simulated at five different $\theta$ levels: $\theta = -2, -1, 0, 1,$ 
and 2. One dataset was simulated in which 1,000 $\theta$s were drawn from $N(0, 1)$. The 
simulation procedure was analogous to the procedure for dichotomously scored items in 
van Krimpen-Stoop & Meijer (1999c). 

For each item score pattern, $l_z$ was determined using $\hat{\theta}$. These 1,000 values of $l_z$ 
constituted the empirical distribution of $l_z$ in each dataset. $\theta$ was estimated using 
the maximum likelihood procedure proposed by Masters (1982). For all simulated 
distributions, the empirical Type I errors were determined as the percentage of item score 
patterns that obtained a value of the statistic below the critical value of the standard normal 
distribution at one-sided significance level $\alpha$ = .005, .01, .016, .0[...]. Also, the 
first three moments of the simulated distributions of $l_z$ were computed and compared with 
the moments of the standard normal distribution. 
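
The logic of this simulation can be sketched in Python as follows. The sketch is a simplified stand-in, not the study's procedure: it evaluates $l_z$ at the true $\theta$ instead of the maximum likelihood estimate, draws hypothetical step difficulties rather than using the Koch & Dodd (1989) pool, and uses a single one-sided level of .05; all function names and the seed are illustrative.

```python
import math
import random

def pcm_probs(theta, deltas):
    """Category probabilities P_i0, ..., P_im of one PCM item."""
    cum = [0.0]  # the term for score 0 is fixed at 0
    for delta in deltas:
        cum.append(cum[-1] + theta - delta)
    top = max(cum)
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def simulate_score(rng, theta, deltas):
    """Draw one item score from the PCM category probabilities."""
    probs = pcm_probs(theta, deltas)
    u, cum = rng.random(), 0.0
    for j, p in enumerate(probs):
        cum += p
        if u < cum:
            return j
    return len(probs) - 1  # guard against rounding in the cumulative sum

def lz(scores, theta, item_deltas):
    """Standardized log-likelihood of one item score pattern."""
    loglik = expected = variance = 0.0
    for x, deltas in zip(scores, item_deltas):
        p = pcm_probs(theta, deltas)
        logs = [math.log(pj) for pj in p]
        loglik += logs[x]
        mean_i = sum(pj * lj for pj, lj in zip(p, logs))
        expected += mean_i
        variance += sum(pj * lj * lj for pj, lj in zip(p, logs)) - mean_i ** 2
    return (loglik - expected) / math.sqrt(variance)

rng = random.Random(1)
# 20 hypothetical three-step items (not the pool used in the study)
items = [[rng.uniform(-2, 0), rng.uniform(-1, 1), rng.uniform(0, 2)] for _ in range(20)]
theta = 0.0
values = []
for _ in range(1000):
    scores = [simulate_score(rng, theta, d) for d in items]
    values.append(lz(scores, theta, items))

critical = -1.645  # one-sided 5% critical value of the standard normal distribution
type1 = sum(v < critical for v in values) / len(values)
mean = sum(values) / len(values)
print(type1, mean)  # empirical Type I error rate and first moment of the simulated l_z
```

An empirical rate close to .05 would indicate that the standard normal critical value is usable for this test length; replacing the true $\theta$ by an estimate is what the study itself examines.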



CAT 

Three CATs were constructed consisting of 10, 20, or 30 items. For all CATs, six datasets 
of 1,000 adaptive item score patterns were simulated. Five datasets were simulated at 
five $\theta$ levels: $\theta = -2, -1, 0, 1,$ and 2. For the sixth dataset, 1,000 $\theta$s were drawn from 
$N(0, 1)$. 

The item selection criterion that was used was maximum item information, where the 
item information function of an $m$-step PCM item is defined as (Samejima, 1969) 

$$I_i(\theta) = \sum_{j=0}^{m} j^2 P_{ij}(\theta) - \left[\sum_{j=0}^{m} j\, P_{ij}(\theta)\right]^2.$$

Maximum likelihood estimation (Masters, 1982) was used to estimate $\theta$, and the fixed 
stepsize procedure with stepsize equal to 0.5 was used until item scores in two different 
categories were obtained. A fixed test length stopping rule was used, where final test 
length $N$ was set to 10, 20, or 30. 
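
Maximum-information item selection can be sketched as below. The sketch assumes the PCM, for which the item information equals the variance of the item score given $\theta$; the three-item pool and the function names are hypothetical.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities P_i0, ..., P_im of one PCM item."""
    cum = [0.0]  # the term for score 0 is fixed at 0
    for delta in deltas:
        cum.append(cum[-1] + theta - delta)
    top = max(cum)
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]

def item_information(theta, deltas):
    """Sum of j^2 * P_ij minus the squared sum of j * P_ij."""
    p = pcm_probs(theta, deltas)
    mean = sum(j * pj for j, pj in enumerate(p))
    return sum(j * j * pj for j, pj in enumerate(p)) - mean ** 2

def select_item(theta_hat, pool, administered):
    """Index of the not yet administered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, pool[i]))

pool = [[-2.0, -1.5, -1.0], [-0.5, 0.0, 0.5], [1.0, 1.5, 2.0]]  # hypothetical pool
print(select_item(0.0, pool, administered=set()))  # 1: the item centered at theta = 0
print(select_item(1.5, pool, administered={1}))    # 2: the item centered at theta = 1.5
```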

For each item score pattern, the empirical distribution of $l_z$ was determined similarly to 
the procedure for P&P tests. For all simulated distributions, the empirical Type I errors 
were determined and the first three moments of the simulated distributions of $l_z$ were 
computed as described above. 

Also, for each dataset and each simulee, statistics $T^1$ and $T^2$ were computed in the 
CUSUM procedure described in Equations 5 through 7, where three different $\theta$ values 
were used to determine $E(X_{i_k} \mid \theta)$: the value of true $\theta$, the value of the final $\theta$ estimate, 
$\hat{\theta}_N$, and the updated $\theta$ estimate, $\hat{\theta}_{k-1}$. For each simulee, 

$$\max C^+ = \max_k \left(C_k^+\right) \text{ and}$$

$$\min C^- = \min_k \left(C_k^-\right)$$

were determined, resulting in 1,000 values of $\max C^+$ and $\min C^-$ for each statistic and 
each dataset. Then, for each dataset and for both statistics, the upper bound, UB, was 
determined as the value of $\max C^+$ for which 2.5% of the simulees had higher $\max C^+$ 
values and the lower bound, LB, was determined as the value of $\min C^-$ for which 2.5% 
of the simulees had lower $\min C^-$ values. That is, a two-sided test at $\alpha < 0.05$ was 
conducted, where $P(\max C^+ > \mathrm{UB}) = P(\min C^- < \mathrm{LB}) = 0.025$. So, for each 
dataset two bounds (the upper and lower bounds) were determined for both $T^1$ and $T^2$. 
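
Deriving the bounds from the simulated extremes amounts to taking order statistics, as the following sketch shows. The function name is hypothetical, ties are assumed to be absent, and the example values are artificial integers chosen only to make the counts easy to verify.

```python
def cusum_bounds(max_c_plus, min_c_minus, alpha=0.05):
    """Return (UB, LB) so that a fraction alpha/2 of simulees falls beyond each bound."""
    n = len(max_c_plus)
    k = int(round(alpha / 2 * n))       # number of simulees allowed beyond each bound
    ub = sorted(max_c_plus)[n - k - 1]  # exactly k values lie above UB (no ties assumed)
    lb = sorted(min_c_minus)[k]         # exactly k values lie below LB (no ties assumed)
    return ub, lb

max_vals = list(range(1, 1001))     # artificial stand-ins for 1,000 values of max C+
min_vals = [-v for v in max_vals]   # artificial stand-ins for 1,000 values of min C-
ub, lb = cusum_bounds(max_vals, min_vals)
print(ub, lb)                                                    # 975 -975
print(sum(v > ub for v in max_vals), sum(v < lb for v in min_vals))  # 25 25
```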

Results 

Empirical Distribution of $l_z$ 

In Tables 2 and 3 the first three moments of the empirical distributions of $l_z$ and 
the empirical Type I errors at five levels of (one-sided) $\alpha$ are given for the P&P tests and 
the 20- and 30-item CATs, respectively. 

Table 2 (P&P tests) shows that the mean of $l_z$ was slightly larger than expected under 
the standard normal distribution, for all datasets and both test lengths. The variance of $l_z$ 
was close to 1, as expected under the standard normal distribution, for most datasets and 
tests, provided that $\theta$ [...]. Furthermore, the skewness of the distribution of $l_z$ was found 
to be negative for most datasets; for the 20-item P&P test and $\theta = -2$ the skewness was 
positive (.901). However, the empirical Type I errors were close to the nominal ones, for 
most datasets and tests. For the 20-item test, the empirical Type I errors were somewhat 
smaller than the nominal ones, whereas for the 30-item test, the empirical error rates were 
slightly larger than the nominal error rates. 

Table 3 (CAT) shows that, for all datasets and both CATs, the mean and variance of $l_z$ 
were found to deviate from 0 and 1, respectively. On average (across all datasets and 
both CATs), the mean and variance were .21 and .76, respectively (not tabulated). Also, 
the skewness of $l_z$ was negative for all datasets and both CATs. Although the first three 
moments of the distribution deviated from the expected values, the empirical Type I errors were 
only slightly smaller than the nominal ones for both CATs and all datasets. This might 
be explained by the negative skewness. As a result, the person-fit statistics were only 
slightly conservative in classifying misfitting item score patterns as aberrant. 

Critical Values of CUSUM 

In Tables 4 and 5 the numerical values of UB and LB of the CUSUM procedure using 
statistics $T^1$ and $T^2$, respectively, are given for the 10-, 20-, and 30-item CATs. 
Table 4 shows that, using true $\theta$ to calculate $T^1$, for all CATs and all datasets, the 
values of UB and LB were almost symmetrical around 0 and similar across different $\theta$ 
values. Moreover, when true $\theta$ was used, the numerical values of UB and LB obtained 
from the datasets with $\theta \sim N(0, 1)$ approximated the bounds of the datasets with fixed $\theta$ 
values. When $\hat{\theta}_N$ was used to determine $T^1$, the numerical values of UB and LB were 
asymmetric around 0 and differed across $\theta$ values. However, for all CATs, the values of 
UB and LB obtained using $\theta \sim N(0, 1)$ and $\hat{\theta}_N$ were similar to those obtained when true 
$\theta$ was used, probably due to the fact that the value of $\hat{\theta}_N$ was close to that of $\theta$. When $\hat{\theta}_{k-1}$ 
was used to determine $T^1$, the values of UB and LB were found to be symmetrical around 
0, but different across $\theta$. But, again, for all CATs, the values of UB and LB obtained from 
the dataset with $\theta \sim N(0, 1)$ using $\hat{\theta}_{k-1}$ were close to those obtained when true $\theta$ was used, 
probably due to accurate estimation of $\theta$. 

The results for statistic $T^2$ in Table 5 show that, when true $\theta$ was used, the values of 
UB and LB were asymmetric around 0 and differed across $\theta$ values, for all CATs and all 
datasets. Furthermore, using $\theta$, the bounds obtained from the dataset with $\theta \sim N(0, 1)$ were 
quite different from those obtained in the datasets with fixed $\theta$. When either $\hat{\theta}_N$ or $\hat{\theta}_{k-1}$ was used, 
the numerical values of the bounds were asymmetric and differed across $\theta$ values for all 
CATs and all datasets. 



Discussion 

In this study, the empirical distribution of an existing person-fit statistic for polytomously 
scored items, $l_z$, was investigated. It was shown that, although the first three moments of 
the empirical distribution of $l_z$ were slightly deviant from the expected values under the 
standard normal distribution, the empirical Type I errors were close to the nominal ones, 
for most datasets and most P&P tests. For CATs, the first three moments of the empirical 
distribution were more deviant from those of the standard normal distribution than for 
P&P tests. However, the empirical Type I errors were slightly smaller than the nominal 
Type I errors. Therefore, $l_z$ is a slightly conservative person-fit statistic when the critical 
values of the standard normal distribution are used, for both P&P tests and CATs with 
partial credit items. 
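
The comparison between empirical and nominal Type I errors can be illustrated with a small simulation. The sketch below is not the design used in this study: it assumes the partial credit model with true θ known, a hypothetical fixed 20-item test built by repeating items 1 to 5 of Table 1, a one-sided lower-tail test, and the standardized log-likelihood form commonly given for polytomous items. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, pvariance

def pcm_probs(theta, deltas):
    """Category probabilities 0..m under the partial credit model."""
    # Cumulative sums of (theta - delta_k); category 0 has an empty sum of 0.
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + (theta - d))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def lz_poly(scores, theta, items):
    """Standardized log-likelihood of one item score pattern."""
    loglik = expected = variance = 0.0
    for x, deltas in zip(scores, items):
        p = pcm_probs(theta, deltas)
        logs = [math.log(pj) for pj in p]
        e_i = sum(pj * lj for pj, lj in zip(p, logs))
        loglik += logs[x]
        expected += e_i
        variance += sum(pj * lj * lj for pj, lj in zip(p, logs)) - e_i ** 2
    return (loglik - expected) / math.sqrt(variance)

def simulate_type1(theta, items, alpha, n_simulees, rng):
    """Mean, variance, and lower-tail rejection rate for fitting simulees."""
    critical = NormalDist().inv_cdf(alpha)  # standard normal critical value
    values = []
    for _ in range(n_simulees):
        scores = [rng.choices(range(len(d) + 1), weights=pcm_probs(theta, d))[0]
                  for d in items]
        values.append(lz_poly(scores, theta, items))
    rate = sum(v < critical for v in values) / n_simulees
    return mean(values), pvariance(values), rate

# Items 1-5 of Table 1, repeated four times (a hypothetical 20-item test).
ITEMS = [(-0.50, 0.00, 0.50), (-0.35, 0.00, 0.35), (-0.75, 0.00, 0.75),
         (0.00, -0.75, 0.75), (-0.75, 0.75, 0.00)] * 4

m, v, rate = simulate_type1(0.0, ITEMS, alpha=0.025, n_simulees=2000,
                            rng=random.Random(1))
print(f"M = {m:.3f}, V = {v:.3f}, Type I error at .025 = {rate:.3f}")
```

Because the sketch uses true θ rather than an estimate, its moments should not be expected to reproduce the values in Tables 2 and 3.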

Interestingly, the results for the empirical distribution of l_z differed from the
results found in van Krimpen-Stoop & Meijer (1999c), where P&P tests and CATs with
dichotomous items were used. The better fit between the empirical and theoretical
distribution for short tests in this study, especially the finding that the
empirical Type I errors were close to the nominal error rates, may be explained by
realizing that each pair of adjacent categories of a polytomous item can be seen
as a single dichotomous item. This effect is comparable with using longer tests,
which also results (e.g., van Krimpen-Stoop & Meijer, 1999c) in higher agreement
between empirical and theoretical distributions.
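
The adjacent-categories argument can be written out for the partial credit model. The notation below (step parameters δ_{ik}, highest category m) is an assumption, since the model equations fall outside this section. Conditional on the score being in one of two adjacent categories, the probability of the higher category has the form of a dichotomous Rasch item:

```latex
P(X_i = j \mid \theta)
  = \frac{\exp \sum_{k=1}^{j} (\theta - \delta_{ik})}
         {\sum_{h=0}^{m} \exp \sum_{k=1}^{h} (\theta - \delta_{ik})},
\qquad
\frac{P(X_i = j \mid \theta)}
     {P(X_i = j-1 \mid \theta) + P(X_i = j \mid \theta)}
  = \frac{\exp(\theta - \delta_{ij})}{1 + \exp(\theta - \delta_{ij})},
\quad j = 1, \ldots, m .
```

Under this reading, an item with three step parameters, as in Table 1, contributes three such dichotomous comparisons.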

Also, the use of two CUSUM procedures for the assessment of person fit in
polytomous CAT was explored by investigating the numerical values of the critical
values. It was shown that the critical values of the CUSUM procedure using the
statistic of Table 4 were symmetric around 0. Moreover, the bounds determined
using θ ~ N(0, 1) were largely in agreement with the bounds obtained using θ_n or
θ_{k-1} and with the bounds obtained when true θ was used. The CUSUM using the
statistic of Table 5 was found to be less stable than the CUSUM using the
statistic of Table 4. Even when the true value of θ was used to determine the
Table 5 statistic, thus for the null model of fitting response behavior, the
critical values were asymmetric around 0 and differed across θ values. Therefore,
it is recommended to perform person-fit analysis with the CUSUM procedure using
the statistic of Table 4 and not that of Table 5. The UB and LB obtained from the
θ ~ N(0, 1) dataset can be used as critical values at significance level α = .05.
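
How such bounds might be read off simulated data can be sketched as follows. This is an illustration under stated assumptions, not the procedure of this study: the per-stage residuals are stand-in normal noise rather than values from a simulated CAT, the two-sided level is split equally over both bounds, and the names `cusum_extremes` and `empirical_bounds` are hypothetical.

```python
import random

def cusum_extremes(residuals):
    """Largest upper sum and smallest lower sum reached during one test."""
    up = down = 0.0
    max_up = min_down = 0.0
    for t in residuals:
        up = max(0.0, up + t)      # upper one-sided sum, never below 0
        down = min(0.0, down + t)  # lower one-sided sum, never above 0
        max_up = max(max_up, up)
        min_down = min(min_down, down)
    return max_up, min_down

def empirical_bounds(residual_sets, alpha=0.05):
    """UB and LB leaving about alpha/2 of fitting simulees beyond each bound."""
    highs, lows = zip(*(cusum_extremes(r) for r in residual_sets))
    highs, lows = sorted(highs), sorted(lows)
    n = len(highs)
    cut = round(n * alpha / 2)  # number of simulees allowed beyond each bound
    if cut >= n:
        raise ValueError("too few simulees for this alpha")
    return highs[n - 1 - cut], lows[cut]

# Stand-in data: 1,000 fitting simulees, 20 stages, hypothetical noise scale.
rng = random.Random(7)
simulated = [[rng.gauss(0.0, 0.1) for _ in range(20)] for _ in range(1000)]
ub, lb = empirical_bounds(simulated, alpha=0.05)

flagged = sum(1 for r in simulated
              if cusum_extremes(r)[0] > ub or cusum_extremes(r)[1] < lb)
print(f"UB = {ub:.2f}, LB = {lb:.2f}, flagged = {flagged / len(simulated):.3f}")
```

By construction, the proportion of fitting simulees flagged in the calibration sample does not exceed the chosen α.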

In the case of examinees with θ values in the tails of the distribution (i.e.,
θ = ±2), the classification of score patterns as either fitting or nonfitting may
be slightly conservative: the empirical Type I error rate tends to be slightly
smaller than the nominal Type I error rate for examinees with θ = ±2; thus,
slightly fewer fitting score patterns than expected are classified as misfitting.

A disadvantage of the CUSUM procedure is that critical values have to be
determined by means of simulations, which may sometimes be difficult to realize
for different item pools and different test lengths. However, an advantage of the
CUSUM procedure compared to l_z is that it is possible to investigate the fit of
an individual item score pattern during test administration. Also, by examining
the graphical plot of the CUSUM, that is, the plot of the values of C+ and C-
against the stage of the CAT, it is possible to track "where it went wrong".
Suppose, for example, that an examinee is unfamiliar with the use of a computer
and becomes familiar with it during administration. Then it is plausible that the
CUSUM passes the lower bound about halfway through the CAT and reaches a stable
level once the examinee has become more familiar with the computer. On the other
hand, when an examinee has preknowledge of a number of difficult items, the CUSUM
may pass the upper bound after the responses to these items.
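
Tracking the stage at which a bound is first passed can be sketched with the usual two-sided CUSUM recursion. The recursion form, the function name `cusum`, and the residual values are assumptions made for illustration; only the bounds (.48 and -.45) are taken from Table 4, namely the 20-item CAT with θ ~ N(0, 1) and true θ.

```python
def cusum(residuals, upper_bound, lower_bound):
    """Return the C+ path, the C- path, and the first stage beyond a bound."""
    c_plus, c_minus = [], []
    up = down = 0.0
    first_flag = None
    for stage, t in enumerate(residuals, start=1):
        up = max(0.0, up + t)      # accumulates scores higher than expected
        down = min(0.0, down + t)  # accumulates scores lower than expected
        c_plus.append(up)
        c_minus.append(down)
        if first_flag is None and (up > upper_bound or down < lower_bound):
            first_flag = stage
    return c_plus, c_minus, first_flag

# Hypothetical examinee: scores below expectation on the first ten items,
# slightly above expectation afterwards.
residuals = [-0.08] * 10 + [0.02] * 10
c_plus, c_minus, first_flag = cusum(residuals, upper_bound=0.48, lower_bound=-0.45)
print("first stage beyond a bound:", first_flag)
```

In this made-up pattern the lower sum passes the lower bound at stage 6, and plotting `c_plus` and `c_minus` against the stage number gives the kind of graphical display described above.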

This study furthermore showed that, for simulees with high or low θ values, it is
difficult to classify an item score pattern as fitting or misfitting solely on the
basis of the outcome of a person-fit statistic such as l_z or the CUSUM procedure,
because it is difficult to identify proper critical values of a statistic for
these simulees. This is the case not only for polytomously scored CATs and P&P
tests but also for dichotomously scored tests (see, e.g., van Krimpen-Stoop &
Meijer, 1999b, 1999c).

Author Note 

This study received funding from the Law School Admission Council (LSAC). The
opinions and conclusions contained in this report are those of the authors and do
not necessarily reflect the position or policy of LSAC.
Table 1

Item Parameter Values of the Item Pool From Koch & Dodd (1989)

Item                                Item
Number      δ1       δ2       δ3    Number      δ1       δ2       δ3
   1     -0.50     0.00     0.50      31     -1.00     0.00     1.00
   2     -0.35     0.00     0.35      32     -1.35     0.00     1.35
   3     -0.75     0.00     0.75      33     -1.25     0.00     1.25
   4      0.00    -0.75     0.75      34      0.00    -1.25     1.25
   5     -0.75     0.75     0.00      35     -1.25     1.25     0.00
   6     -0.50     0.00     0.50      36     -1.00     0.00     1.00
   7     -0.35     0.00     0.35      37     -1.35     0.00    -1.35
   8     -0.75     0.00     0.75      38     -1.25     0.00     1.25
   9      0.00    -0.75     0.75      39      0.00    -1.25     1.25
  10     -0.75     0.75     0.00      40     -1.25     1.25     0.00
  11      1.30     1.80     2.30      41      0.50     1.50     2.50
  12      0.80     1.55     2.30      42      0.50     1.75     2.50
  13      1.30     1.60     2.30      43      0.70     2.00     2.70
  14      1.30     1.90     2.30      44      0.80     1.90     2.50
  15      1.00     1.40     2.00      45      0.80     1.40     2.50
  16      0.50     0.90     1.50      46      0.50     0.90     2.50
  17      1.55     0.80     2.30      47      1.75     0.50     2.50
  18      0.80     2.30     1.55      48      0.50     2.50     1.75
  19      1.40     1.00     2.00      49      1.40     0.80     2.50
  20      1.00     2.00     1.40      50      0.80     2.50     1.40
  21     -2.30    -1.80    -1.30      51     -2.50    -1.50    -0.50
  22     -2.30    -1.55    -0.80      52     -2.50    -1.75    -0.50
  23     -2.30    -1.60    -1.30      53     -2.70    -2.00    -0.70
  24     -2.03    -1.90    -1.30      54     -2.50    -1.90    -0.80
  25     -2.00    -1.40    -1.00      55     -2.50    -1.40    -0.80
  26     -1.50    -0.90    -0.50      56     -2.50    -0.90    -0.50
  27     -2.30    -0.80    -1.55      57     -2.50    -0.50    -1.75
  28     -1.55    -2.30    -0.80      58     -1.75    -2.50    -0.50
  29     -2.00    -1.00    -1.40      59     -2.50    -0.80    -1.40
  30     -1.40    -2.00    -1.00      60     -1.40    -2.50    -0.80

Table 2

Mean (M), Variance (V), Skewness (S), and Type I Errors of the Simulated
Distributions of l_z^pol (Using θ) for the 20- and 30-item P&P Tests

                                                  Type I Errors
Test and θ          M       V       S     .005    .010    .015    .020    .025

20-item P&P Test
  N(0, 1)         .058    .639   -.496    .005    .008    .010    .012    .015
  θ = -2.0        .133    .249    .901    .000    .000    .000    .000    .000
  θ = -1.0        .042    .260   -.609    .000    .001    .002    .003    .003
  θ =  0.0        .105    .763   -.568    .006    .008    .012    .017    .022
  θ =  1.0        .088    .908   -.564    .011    .016    .020    .026    .028
  θ =  2.0        .087    .485   -.374    .000    .002    .003    .004    .005

30-item P&P Test
  N(0, 1)         .117    .927   -.592    .010    .016    .021    .025    .028
  θ = -2.0        .002    .574   -.677    .002    .012    .014    .016    .017
  θ = -1.0        .100    .838   -.473    .008    .010    .012    .015    .021
  θ =  0.0        .058   1.037   -.597    .015    .022    .028    .031    .035
  θ =  1.0        .098    .839   -.480    .010    .013    .014    .017    .020
  θ =  2.0        .021    .536   -.463    .001    .001    .004    .007    .007

Table 3

Mean (M), Variance (V), Skewness (S), and Type I Errors of the Simulated
Distributions of l_z^pol (Using θ) for 20- and 30-item CATs

                                                  Type I Errors
Test and θ          M       V       S     .005    .010    .015    .020    .025

20-item CAT
  N(0, 1)         .317    .795   -.188    .002    .003    .006    .006    .007
  θ = -2.0        .078    .601   -.086    .001    .002    .004    .005    .008
  θ = -1.0        .390    .783   -.304    .001    .003    .003    .004    .005
  θ =  0.0        .247    .800   -.132    .000    .003    .006    .007    .008
  θ =  1.0        .345    .762   -.440    .003    .003    .007    .008    .011
  θ =  2.0        .067    .744   -.408    .005    .007    .010    .015    .016

30-item CAT
  N(0, 1)         .195    .865   -.325    .005    .006    .013    .015    .018
  θ = -2.0        .079    .576   -.444    .002    .007    .007    .008    .009
  θ = -1.0        .225    .899   -.518    .007    .012    .014    .017    .018
  θ =  0.0        .271    .838   -.233    .003    .007    .008    .009    .012
  θ =  1.0        .182    .941   -.435    .005    .009    .013    .015    .016
  θ =  2.0        .094    .568   -.346    .001    .001    .004    .006    .008

Table 4

Boundaries of CUSUM Using for 10-, 20-, and
30-item CATs at α = .05 (two-sided)

                      θ               θ_n             θ_{k-1}
Test and θ        UB      LB       UB      LB       UB      LB

10-item CAT
  N(0, 1)        .68    -.63      .56    -.61      .84    -.85
  θ = -2         .60    -.58      .65    -.24      .81    -.70
  θ = -1         .62    -.60      .58    -.39     1.00    -.77
  θ =  0         .64    -.63      .33    -.56      .67    -.86
  θ =  1         .67    -.65      .49    -.69      .49    -.92
  θ =  2         .60    -.69      .27    -.61      .66    -.73

20-item CAT
  N(0, 1)        .48    -.45      .50    -.60      .63    -.76
  θ = -2         .47    -.44      .60    -.13      .78    -.34
  θ = -1         .46    -.51      .45    -.28      .73    -.51
  θ =  0         .45    -.46      .22    -.51      .45    -.79
  θ =  1         .44    -.47      .17    -.63      .29    -.83
  θ =  2         .43    -.46      .16    -.60      .40    -.70

30-item CAT
  N(0, 1)        .37    -.36      .41    -.53      .49    -.69
  θ = -2         .34    -.33      .47    -.11      .60    -.29
  θ = -1         .37    -.35      .16    -.50      .32    -.73
  θ =  0         .36    -.37      .32    -.24      .57    -.43
  θ =  1         .38    -.35      .12    -.57      .18    -.75
  θ =  2         .32    -.34      .11    -.50      .22    -.57

Table 5

Boundaries of CUSUM Using for 10-, 20-, and
30-item CATs at α = .05 (two-sided)

                      θ               θ_n             θ_{k-1}
Test and θ        UB      LB       UB      LB       UB      LB

10-item CAT
  N(0, 1)        .78    -.87      .67    -.67      .93    -.91
  θ = -2        1.23    -.29      .86    -.26     1.26    -.65
  θ = -1         .76    -.49      .55    -.39     1.01    -.77
  θ =  0         .45    -.81      .34    -.62      .63    -.92
  θ =  1         .38    -.98      .30    -.76      .50   -1.01
  θ =  2         .41   -1.17      .32    -.68      .71    -.83

20-item CAT
  N(0, 1)        .59    -.84      .54    -.66      .94    -.84
  θ = -2        1.06    -.17      .79    -.14     1.12    -.32
  θ = -1         .54    -.42      .42    -.28      .80    -.51
  θ =  0         .27    -.70      .23    -.56      .43    -.85
  θ =  1         .20    -.88      .18    -.70      .30    -.91
  θ =  2         .22    -.96      .19    -.68      .41    -.83

30-item CAT
  N(0, 1)        .54    -.78      .51    -.61      .60    -.78
  θ = -2         .86    -.15      .68    -.14     1.00    -.30
  θ = -1         .19    -.63      .16    -.57      .32    -.82
  θ =  0         .40    -.34      .32    -.24      .61    -.44
  θ =  1         .15    -.84      .13    -.65      .19    -.85
  θ =  2         .15    -.81      .13    -.60      .23    -.74





A publication by
The Faculty of Educational Science and Technology of the University of Twente
P.O. Box 217
7500 AE Enschede
The Netherlands







